CN-Celeb: Multi-genre speaker recognition

Abstract

Research on speaker recognition is extending to address its vulnerability in wild conditions, among which genre mismatch is perhaps the most challenging: for instance, enrollment with reading speech while testing with conversational or singing audio. This mismatch leads to complex and composite inter-session variations, both intrinsic (i.e., speaking style, physiological status) and extrinsic (i.e., recording device, background noise). Unfortunately, the few existing multi-genre corpora are not only limited in size but also recorded under controlled conditions, and so cannot support conclusive research on the multi-genre problem. In this work, we first publish CN-Celeb, a large-scale multi-genre corpus that includes in-the-wild speech utterances of 3000 speakers in 11 different genres. Second, using this dataset, we conduct a comprehensive study of the multi-genre phenomenon, in particular the impact of the multi-genre challenge on speaker recognition and the performance gain when the new dataset is used for multi-genre training.

Similar resources

Speaker recognition in a multi-speaker environment

We discuss the multi-speaker tasks of detection, tracking, and segmentation of speakers as included in recent NIST Speaker Recognition Evaluations. We consider how performance for the two-speaker detection task is related to that for the corresponding one-speaker task. We examine the effects of target speaker speech duration and the gender mix within test segments on results for these tasks. We...

Connectionist Architectures for Multi-Speaker Phoneme Recognition

We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (/b,d,g/ task). We use speech of two females and four males to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single TDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network ...

Multi-speaker Recognition in Cocktail Party Problem

This paper proposes an original statistical decision theory to accomplish a multi-speaker recognition task in the cocktail party problem. This theory relies on the assumption that the varied frequencies of speakers obey a Gaussian distribution and that the relationship of their voiceprints can be represented by Euclidean distance vectors. This paper uses Mel-Frequency Cepstral Coefficients to extract the f...

Classifier ensembles for genre recognition

Previous work in genre recognition and characterization from symbolic sources (monophonic melodies extracted from MIDI files) has pointed our research to the use of classifier ensembles to better accomplish the task. This work presents current research in the use of voting ensembles of classifiers trained on statistical description models of melodies, in order to improve both the accuracy...

Speaker Recognition

Speaker recognition is basically divided into two classes, speaker recognition and speaker identification, and it is the method of automatically identifying who is speaking on the basis of individual information integrated in speech waves. Speaker recognition is widely applicable to using a speaker's voice to verify their identity and control access to services such as banking by telephone, d...

Journal

Journal title: Speech Communication

Year: 2022

ISSN: 1872-7182, 0167-6393

DOI: https://doi.org/10.1016/j.specom.2022.01.002